Introduction



Linear statistical analysis, and the least squares method specifically, achieved 
their modern complete form in the language of linear algebra, that is in the 
language of geometry. In this article we will show that multivariate linear 
statistical analysis in the language of geometry can be stated just as beauti- 
fully and clearly. In order to do this, the standard methods of linear algebra 
must be expanded. The first part of this article introduces this generaliza- 
tion of linear algebra. The second part introduces the theory of multivariate 
statistical analysis in the first part's language. We believe that until now 
multivariate statistical analysis, though explained in dozens of textbooks, 
has not had adequate forms of expression. 

Multivariate observations are observations of several random quantities
in one random experiment. From here on we record multivariate observations
as columns. We commonly provide multivariate observations with indices.
In the simplest case, natural numbers serve as the indices (for instance,
the numbers of the observations in the order in which they were recorded).
For independent identically distributed observations this is a fitting way
to organize the information. If the distributions of the observations depend
on one or more factors, the values or combinations of values of these factors
can serve as indices. Commonly the levels of the factors are numbered. In
that case the index is a tuple of numbers. So, in a two-factor scheme
(classification by two traits), pairs of natural numbers serve as indices.

We shall call the set of observations, provided with indices and so orga- 
nized, an array. 

For theoretical analysis the linear numeration of data is most convenient,
and we shall keep to this system from here on. When analyzing examples we
will return, if needed, to the natural indexing of data.

In univariate statistical analysis the numeration of data allows recording 
as rows. In the multivariate case the entirety of the enumerated data (that 
is arrays) can also be examined as a row of columns. In many cases (but 
not always) such an array can be treated as a matrix. 

Arrays of one form naturally form a vector space under the operations
of addition and multiplication by numbers. For the purposes of statistical
analysis this vector space is given a scalar product. In one-dimensional
analysis, if the observations are independent and have equal variances,
then the most fitting scalar product is the euclidean product. In more detail:
let the observations have an index $\alpha$; let arrays $T_X$ and $T_Y$ be
composed of the one-dimensional elements $X_\alpha$, $Y_\alpha$. Then the
euclidean scalar product of arrays $T_X$ and $T_Y$ is

\[ (T_X, T_Y) = \sum_{\alpha} X_\alpha Y_\alpha, \tag{0.1} \]

where the index of summation goes through all possible values. We shall
record multivariate observations as columns. In the multivariate case, the
elements $X_\alpha$, $Y_\alpha$ are columns. For arrays composed of columns,
let us accept the following definition of the scalar product of arrays
$T_X$ and $T_Y$:

\[ (T_X, T_Y) = \sum_{\alpha} X_\alpha Y_\alpha^T. \tag{0.2} \]

The scalar product (0.2) is a square matrix. Therefore, for arrays composed
of columns, square matrices of the corresponding dimensions must play the
role of scalars. With the help of the scalar product (0.2) and its
consequences, this article develops a theory of multivariate statistical
analysis parallel to the existing well-known univariate theory.

1 Modules of Arrays Over a Ring of Matrices

1.1 Space of Arrays 

In the introduction we agreed to hold to a linear order of indexation for 
simplicity's sake. However, all the introduced theorems need only trivial 
changes to apply to arrays with a different indexation. 
Let us consider a p-dimensional array with n elements,

\[ T := \{X_i \mid i = \overline{1,n}\}, \tag{1.1} \]

where $X_1, X_2, \dots, X_n$ are p-dimensional vector-columns. Arrays of this
nature form a linear space with addition and multiplication by numbers.

1. Addition:

\[ \{X_i \mid i = \overline{1,n}\} + \{Y_i \mid i = \overline{1,n}\} = \{X_i + Y_i \mid i = \overline{1,n}\}. \]

2. Multiplication by numbers: let $\lambda$ be a number; then

\[ \lambda \{X_i \mid i = \overline{1,n}\} = \{\lambda X_i \mid i = \overline{1,n}\}. \]

In addition, we will be examining the element-by-element multiplication
of arrays by square matrices of the appropriate dimensions.

3. Left multiplication by a matrix: let $K$ be a square matrix of
dimensions $p \times p$. Set

\[ K \{X_i \mid i = \overline{1,n}\} = \{K X_i \mid i = \overline{1,n}\}. \tag{1.2} \]

Note that the multiplication of an array by a number can be regarded as
a special case of multiplication by a square matrix. Specifically,
multiplication by the number $\lambda$ is multiplication by the matrix
$\lambda I$, where $I$ is the identity matrix of dimensions $p \times p$.

4. Right multiplication by matrices: let
$Q = \| q_{ij} \mid i = \overline{1,n},\ j = \overline{1,n} \|$ be a square
$n \times n$ matrix. Let us define the right multiplication of array $T$ (1.1)
by matrix $Q$ as

\[ \{X_i \mid i = \overline{1,n}\}\, Q = \Bigl\{ \sum_{j=1}^{n} X_j q_{ji} \Bigm| i = \overline{1,n} \Bigr\}. \tag{1.3} \]

It is clear that the product $TQ$ is defined by the usual "row by column"
rule of matrix multiplication, with the difference that the elements of the
row $T$ (the array $T$) are not numbers but columns $X_1, \dots, X_n$.

5. Let us define the inner product in array space. Because of its properties
we shall call it the scalar product (or generalized scalar product). In more
detail: let

\[ T = \{X_i \mid i = \overline{1,n}\}, \qquad R = \{Y_i \mid i = \overline{1,n}\}. \]

Definition 1. The scalar (generalized scalar) product of arrays $T$ and $R$
is defined as

\[ (T, R) = \sum_{i=1}^{n} X_i Y_i^T. \tag{1.4} \]

The result of the product is a square $p \times p$ matrix. The scalar product
is not commutative:

\[ (R, T) = (T, R)^T. \]
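Definition 1 can be illustrated numerically. The following is only a sketch: the helper names `outer` and `scalar_product` and the sample arrays are our own assumptions, not part of the original text. An array is stored as a list of its p-columns and the sum (1.4) is accumulated column by column.

```python
def outer(x, y):
    """Outer product x y^T of two p-columns given as flat lists."""
    return [[a * b for b in y] for a in x]


def scalar_product(T, R):
    """Generalized scalar product (1.4): the sum over i of X_i Y_i^T."""
    p = len(T[0])
    total = [[0] * p for _ in range(p)]
    for x, y in zip(T, R):
        term = outer(x, y)
        for a in range(p):
            for b in range(p):
                total[a][b] += term[a][b]
    return total


def transpose(A):
    return [list(col) for col in zip(*A)]


# Hypothetical arrays with p = 2 and n = 3; each inner list is one column.
T = [[1, 0], [2, 1], [0, 3]]
R = [[1, 1], [1, 0], [1, 0]]

TR = scalar_product(T, R)   # a (2 x 2) matrix, not a number
RT = scalar_product(R, T)
print(TR)   # [[3, 1], [4, 0]]
print(RT)   # [[3, 4], [1, 0]], the transpose of TR
```

The two printed matrices differ, which shows the non-commutativity, and each is the transpose of the other.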

6. The scalar square of array $T$,

\[ (T, T) = \sum_{i=1}^{n} X_i X_i^T, \tag{1.5} \]

is a symmetric and nonnegative definite $(p \times p)$ matrix. To denote the
scalar square we shall use the traditional symbol of absolute value:
$(T, T) = |T|^2$. In our case, $|T|$ is the so-called matrix modulus [7].

7. The properties of the scalar product in array spaces are similar
to the properties of the traditional scalar product in euclidean vector spaces.
If $T_1, T_2, T_3$ are arbitrary arrays, then

$(T_1 + T_2, T_3) = (T_1, T_3) + (T_2, T_3)$;

$(K T_1, T_2) = K (T_1, T_2)$, where $K$ is a square $(p \times p)$ matrix;

$(T_1, T_1) \ge 0$ in the sense of the comparison of square symmetric matrices;

$(T_1, T_1) = 0$ iff $T_1 = 0$.

8. We say that array $T$ is orthogonal to array $R$ if

\[ (T, R) = 0. \]

Note that if $(T, R) = 0$, then also $(R, T) = 0$. Therefore the property of
orthogonality of arrays is reciprocal. The orthogonality of arrays $T$ and $R$
shall be denoted as $T \perp R$.

9. Notice a Pythagorean theorem: if arrays T and R are orthogonal, 
then 

(T + R, T + R) = (T, T) + (R, R). (1.6) 

We note again that the result of a scalar product of two arrays is a
$(p \times p)$ matrix; therefore in array spaces square matrices of the
corresponding dimensions should play the role of scalars. In particular, left
multiplication by a $(p \times p)$ matrix $K$ shall be understood as
multiplication by a scalar, and the array $KT$ shall be understood as
proportional to the array $T$.
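The orthogonality relation and the Pythagorean identity (1.6) can be checked on a small example. This is a sketch under our own assumptions: the arrays below are hypothetical and were chosen so that $(T, R) = 0$.

```python
def scalar_product(T, R):
    """(T, R) = sum_i X_i Y_i^T for arrays stored as lists of p-columns."""
    p = len(T[0])
    return [[sum(x[a] * y[b] for x, y in zip(T, R)) for b in range(p)]
            for a in range(p)]


def add(A, B):
    """Elementwise sum of two equally shaped nested lists."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


# Hypothetical arrays with p = 2 and n = 4; each inner list is one column.
T = [[1, 2], [1, 2], [0, 0], [0, 0]]
R = [[1, 0], [-1, 0], [0, 1], [0, 0]]

cross = scalar_product(T, R)                              # the zero matrix
lhs = scalar_product(add(T, R), add(T, R))                # (T + R, T + R)
rhs = add(scalar_product(T, T), scalar_product(R, R))     # (T, T) + (R, R)
print(cross, lhs, rhs)
```

Both sides of (1.6) come out as the same $(2 \times 2)$ matrix, so the identity holds as an equality of matrices, not merely of traces.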

Together with arrays of the form (1.1) we shall consider the matrices in
one-to-one correspondence with them,

\[ \mathcal{X} = \| X_1, X_2, \dots, X_n \|. \tag{1.7} \]

Matrix (1.7) is a matrix with p rows and n columns.

Notation. Matrices with p rows and n columns shall be called $(p \times n)$
matrices. The set of $(p \times n)$ matrices we shall call $\mathbb{R}^p_n$.
Matrices of dimensions $(p \times 1)$ we shall call p-columns, or simply
columns. The set of p-columns we denote by $\mathbb{R}^p_1$. Matrices
$(1 \times n)$ we shall call n-rows, or simply rows. The set of n-rows we
denote by $\mathbb{R}^1_n$.

Many operations with arrays can be carried out in their matrix forms.
For instance, the addition of arrays is equivalent to the addition of their
corresponding matrices; left multiplication by a square $(p \times p)$ matrix
$K$ is equivalent to the matrix product $K\mathcal{X}$; right multiplication by
matrix $Q$ is equivalent to the matrix product $\mathcal{X}Q$; the scalar
product of arrays

\[ T_X = \{X_i \mid i = \overline{1,n}\}, \qquad T_Y = \{Y_i \mid i = \overline{1,n}\} \]

is equal to the product of their corresponding matrices $\mathcal{X}$ and
$\mathcal{Y}^T$:

\[ (T_X, T_Y) = \mathcal{X}\mathcal{Y}^T. \tag{1.8} \]

We show, for instance, that array $TQ$ corresponds to matrix $\mathcal{X}Q$.
Here $T$ is an arbitrary array of form (1.1) and $\mathcal{X}$ is the
corresponding $(p \times n)$ matrix (1.7). Let
$Q = \{q_{\alpha\beta} \mid \alpha, \beta = \overline{1,n}\}$ be an
$(n \times n)$ matrix (with numerical elements $q_{\alpha\beta}$).

Proposition 1. Matrix $\mathcal{X}Q$ corresponds to array $TQ$.

Proof. The elements of array $T$, being the columns of matrix $\mathcal{X}$,
must be written out in detail. Let

\[ X_j = (x_{1j}, x_{2j}, \dots, x_{pj})^T, \qquad j = \overline{1,n}. \]

In this notation,

\[ \mathcal{X}Q = \Bigl\| \sum_{j=1}^{n} x_{ij} q_{jk} \Bigm| i = \overline{1,p},\ k = \overline{1,n} \Bigr\|. \tag{1.9} \]

The array

\[ T_Y = \{Y_k \mid k = \overline{1,n}\} \]

corresponds to matrix $\mathcal{X}Q$, where

\[ Y_k = (y_{1k}, y_{2k}, \dots, y_{pk})^T \]

and

\[ y_{ik} = \sum_{j=1}^{n} x_{ij} q_{jk} \]

by (1.9). Array $TQ$, by definition (1.3), is equal to

\[ TQ = \{X_i \mid i = \overline{1,n}\}\, Q = \Bigl\{ \sum_{j=1}^{n} X_j q_{jk} \Bigm| k = \overline{1,n} \Bigr\} = \{Z_k \mid k = \overline{1,n}\}, \]

where the p-column

\[ Z_k = \sum_{j=1}^{n} X_j q_{jk} = \sum_{j=1}^{n} (x_{1j}, x_{2j}, \dots, x_{pj})^T q_{jk}
 = \Bigl( \sum_{j=1}^{n} x_{1j} q_{jk},\ \sum_{j=1}^{n} x_{2j} q_{jk},\ \dots,\ \sum_{j=1}^{n} x_{pj} q_{jk} \Bigr)^T. \tag{1.10} \]

Comparing expressions (1.9) and (1.10), we see the equality of their
elements. □
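Proposition 1 can also be checked numerically. The sketch below rests on our own assumptions: the function names `right_multiply`, `to_matrix` and `matmul` are hypothetical, and the data are arbitrary. It applies definition (1.3) column by column and compares the result with the ordinary matrix product.

```python
def right_multiply(T, Q):
    """Definition (1.3): the i-th column of TQ is sum_j X_j q_ji."""
    n, p = len(T), len(T[0])
    return [[sum(T[j][a] * Q[j][i] for j in range(n)) for a in range(p)]
            for i in range(n)]


def to_matrix(T):
    """Matrix form (1.7): the columns of the (p x n) matrix are the X_i."""
    return [list(row) for row in zip(*T)]


def matmul(A, B):
    """Ordinary row-by-column matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


T = [[1, 0], [2, 1], [0, 3]]                 # p = 2, n = 3
Q = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]        # an arbitrary (3 x 3) matrix

TQ = right_multiply(T, Q)                    # array, as a list of columns
XQ = matmul(to_matrix(T), Q)                 # matrix product
print(to_matrix(TQ) == XQ)                   # True
```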

Thus in the tensor product $\mathbb{R}^p_1 \otimes \mathbb{R}^1_n$ we have
introduced the structure of a module over the ring of square matrices,
equipped with an inner product, which we call the scalar product.

1.2 Linear Transformations 

Many concepts of classical linear algebra transfer to array space almost
automatically, with the natural expansion of the field of scalars to the ring
of square matrices. For instance, a transformation $f(\cdot)$ of array space
(1.1) onto itself is called linear if for any arrays $T_1$ and $T_2$ and for
any $(p \times p)$ matrices $K_1$ and $K_2$

\[ f(K_1 T_1 + K_2 T_2) = K_1 f(T_1) + K_2 f(T_2). \tag{1.11} \]

Linear transformations in array space are performed by right multiplication
by square matrices. Let $Q$ be an arbitrary $(n \times n)$ matrix and $T$ an
arbitrary array (1.1). That the transformation

\[ f(T) = TQ \]

is linear in the sense of (1.11) follows directly from the definition (1.3).
That there are no other linear transformations follows from their absence
even in the case $p = 1$. (As we know, all linear transformations in vector
spaces of rows are performed by right multiplication by square $(n \times n)$
matrices.)

Note that the matrix form (1.7) of representing an array is also suitable
for representing linear transformations: the matrix $\mathcal{X}Q$ (the
product of matrices $\mathcal{X}$ and $Q$) coincides with the matrix form of
the array (1.3)

\[ TQ = \{X_i \mid i = \overline{1,n}\}\, Q = \Bigl\{ \sum_{j=1}^{n} X_j q_{ji} \Bigm| i = \overline{1,n} \Bigr\}. \]

We shall call a linear transformation of array space onto itself orthogonal
if this transformation preserves the scalar product. This means that for any
arrays $T$ and $R$

\[ (TQ, RQ) = (T, R). \]

It is easy to see that orthogonal transformations are performed by right
multiplication by orthogonal matrices. Indeed,

\[ (TQ, RQ) = \sum_{i=1}^{n} \Bigl( \sum_{j=1}^{n} X_j q_{ji} \Bigr) \Bigl( \sum_{l=1}^{n} Y_l q_{li} \Bigr)^T
 = \sum_{j=1}^{n} \sum_{l=1}^{n} \Bigl( \sum_{i=1}^{n} q_{ji} q_{li} \Bigr) X_j Y_l^T
 = \sum_{j=1}^{n} X_j Y_j^T = (T, R), \]

since matrix $Q$ is orthogonal and therefore

\[ \sum_{i=1}^{n} q_{ji} q_{li} = \delta_{jl} \quad \text{(Kronecker symbol)}. \]
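The invariance of the scalar product under orthogonal transformations can be verified in exact arithmetic. This is a minimal sketch under our own assumptions: the matrix $Q$ below (a plane rotation with rational entries) and the data are hypothetical, and arrays are handled in their matrix form (1.7), so that $(T, R) = \mathcal{X}\mathcal{Y}^T$.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Ordinary row-by-column matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


X = [[1, 2, 0], [0, 1, 3]]     # matrix form of T, p = 2, n = 3
Y = [[1, 1, 1], [1, 0, 0]]     # matrix form of R

# A rotation in the first two coordinates; rational entries keep everything exact.
Q = [[F(3, 5), F(4, 5), 0],
     [F(-4, 5), F(3, 5), 0],
     [0, 0, 1]]

QQt = matmul(Q, transpose(Q))                            # should be the identity
before = matmul(X, transpose(Y))                         # (T, R)
after = matmul(matmul(X, Q), transpose(matmul(Y, Q)))    # (TQ, RQ)
print(before == after)                                   # True
```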

1.3 Generating Bases and Coordinates 

Let $\alpha \in \mathbb{R}^p_1$, $x \in \mathbb{R}^1_n$, so that
$\alpha x \in \mathbb{R}^p_n$. Here $\alpha x$ denotes the product of matrices
$\alpha$ and $x$. Matrices of the form $\alpha x$ play a special role in array
spaces.

Let n-rows $e_1, e_2, \dots, e_n \in \mathbb{R}^1_n$ form a basis of the space
$\mathbb{R}^1_n$. Let $\alpha_1, \alpha_2, \dots, \alpha_n \in \mathbb{R}^p_1$
be arbitrary p-columns. Let us consider the $(p \times n)$-matrices
$\alpha_1 e_1, \alpha_2 e_2, \dots, \alpha_n e_n$.

Theorem 1. Any $(p \times n)$-matrix $\mathcal{X}$ (1.7) can be represented as

\[ \mathcal{X} = \sum_{i=1}^{n} \alpha_i e_i \tag{1.12} \]

for some choice of $\alpha_1, \alpha_2, \dots, \alpha_n \in \mathbb{R}^p_1$,
and this representation is unique.

Proof. Let $E$ be the $(n \times n)$-matrix formed by the n-rows
$e_1, e_2, \dots, e_n$. Let us also introduce the $(p \times n)$-matrix $A$
formed by the p-columns $\alpha_1, \alpha_2, \dots, \alpha_n$. With the
matrices $A$ and $E$ the sum (1.12) can be represented as

\[ \sum_{i=1}^{n} \alpha_i e_i = AE. \]

Here are some calculations to confirm this assertion. Let
$\alpha_i = (\alpha_{1i}, \alpha_{2i}, \dots, \alpha_{pi})^T$ and
$e_i = (e_{i1}, e_{i2}, \dots, e_{in})$. Then

\[ \sum_{i=1}^{n} \alpha_i e_i = \sum_{i=1}^{n} (\alpha_{1i}, \alpha_{2i}, \dots, \alpha_{pi})^T (e_{i1}, e_{i2}, \dots, e_{in}). \]

The element at the $(k, l)$-position of each product $\alpha_i e_i$,
$i = 1, \dots, n$, is $\alpha_{ki} e_{il}$. Their total sum, which is the
$(k, l)$-element of the matrix $\sum_{i=1}^{n} \alpha_i e_i$, is
$\sum_{i=1}^{n} \alpha_{ki} e_{il}$.

The element at the $(k, l)$-position of the matrix $AE$ (calculated by the row
by column rule) is

\[ \sum_{i=1}^{n} \alpha_{ki} e_{il}. \]

The calculated results are equal.

The theorem will be proven if we show that the equation

\[ \mathcal{X} = AE \tag{1.13} \]

has a unique solution with respect to the $(p \times n)$-matrix $A$. Since the
$(n \times n)$-matrix $E$ is invertible, the solution is obvious:

\[ A = \mathcal{X} E^{-1}. \tag{1.14} \]

□
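Formula (1.14) is easy to try out. The sketch below uses assumptions of our own: a hypothetical basis of 2-rows, an arbitrary $(2 \times 2)$ matrix, and a hand-written helper `inverse_2x2` that only handles the $2 \times 2$ case. It computes the coordinates $A = \mathcal{X}E^{-1}$ and then rebuilds $\mathcal{X}$ as the sum (1.12).

```python
from fractions import Fraction as F


def matmul(A, B):
    """Ordinary row-by-column matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def inverse_2x2(M):
    """Inverse of an invertible (2 x 2) matrix by the adjugate formula."""
    (a, b), (c, d) = M
    det = F(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


E = [[1, 1], [1, -1]]     # generating basis: e_1 = (1, 1), e_2 = (1, -1)
X = [[2, 4], [6, 0]]      # the matrix to be decomposed, p = n = 2

A = matmul(X, inverse_2x2(E))     # coordinates: column i of A is alpha_i

# Rebuild X as sum_i alpha_i e_i, a sum of rank-one (p x n) matrices.
rebuilt = [[0, 0], [0, 0]]
for i in range(2):
    alpha_i = [A[0][i], A[1][i]]
    for k in range(2):
        for l in range(2):
            rebuilt[k][l] += alpha_i[k] * E[i][l]

print(rebuilt == X)               # True
```

Here the coordinates are $\alpha_1 = (3, 3)^T$ and $\alpha_2 = (-1, 3)^T$.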

The theorem allows us to say that a basis of $\mathbb{R}^1_n$ generates the
space $\mathbb{R}^p_n$ (by the above method). Thus, the bases in
$\mathbb{R}^1_n$ shall be called generating bases in relation to
$\mathbb{R}^p_n$. The p-columns $\alpha_1, \alpha_2, \dots, \alpha_n$ from
(1.12) can be understood as the coordinates of $\mathcal{X}$ in the generating
basis $e_1, e_2, \dots, e_n$. For the canonical basis of the space
$\mathbb{R}^1_n$ (where $e_i$ is the n-row in which the i-th element is one
and the others are zero) the coordinates of $\mathcal{X}$ relative to this
basis are the p-columns $X_1, \dots, X_n \in \mathbb{R}^p_1$ which form the
matrix $\mathcal{X}$.

The coordinates of the $(p \times n)$-matrix $\mathcal{X}$ in two different
generating bases are connected by a linear transformation. For example, let
the n-rows $f_1, \dots, f_n$ form a basis in $\mathbb{R}^1_n$. Let $F$ be the
$(n \times n)$-matrix composed of these n-rows. By Theorem 1 there exists a
unique set of p-columns $\beta_1, \beta_2, \dots, \beta_n$ that are the
coordinates of $\mathcal{X}$ relative to the generating basis
$f_1, \dots, f_n$. The matrices $B = \| \beta_1, \dots, \beta_n \|$ and $F$ are
connected to the $(p \times n)$-matrix $\mathcal{X}$ by the equality

\[ \mathcal{X} = BF. \tag{1.15} \]

Together with (1.13) this gives

\[ BF = AE. \]

Therefore,

\[ B = AEF^{-1}, \qquad A = BFE^{-1}. \]

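Corollary 1. If the generating bases $e_1, \dots, e_n$ and $f_1, \dots, f_n$
are orthogonal, then the transformation of the coordinates of an array in one
basis to its coordinates in the other is performed through multiplication by
an orthogonal matrix.

Let us consider an arbitrary orthogonal basis $e_1, \dots, e_n$ in
$\mathbb{R}^1_n$ (orthogonal here in the sense that
$e_i e_j^T = \delta_{ij}$). For arbitrary $(p \times n)$-matrices
$\mathcal{X}$ and $\mathcal{Y}$ we have the decompositions (1.12) with respect
to this basis:

\[ \mathcal{X} = \sum_{i=1}^{n} \alpha_i e_i, \qquad \mathcal{Y} = \sum_{i=1}^{n} \gamma_i e_i. \]

We can express the scalar product of $\mathcal{X}$ and $\mathcal{Y}$ through
their coordinates. It is easy to see that

\[ (T_X, T_Y) = \mathcal{X}\mathcal{Y}^T = \sum_{i=1}^{n} \alpha_i \gamma_i^T. \tag{1.16} \]

Corollary 2. In an orthogonal basis, the scalar product of two arrays is
equal to the sum of the pairwise products of the coordinates.

Proof. Indeed,

\[ \mathcal{X}\mathcal{Y}^T = \Bigl( \sum_{i=1}^{n} \alpha_i e_i \Bigr) \Bigl( \sum_{j=1}^{n} \gamma_j e_j \Bigr)^T
 = \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i e_i e_j^T \gamma_j^T = \sum_{i=1}^{n} \alpha_i \gamma_i^T, \]

since for the orthogonal basis $e_i e_j^T = \delta_{ij}$. □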
Therefore the scalar square of $T_X$ equals

\[ |T_X|^2 = \mathcal{X}\mathcal{X}^T = \sum_{i=1}^{n} \alpha_i \alpha_i^T. \]

We can conclude from here that the squared length of an array is equal to
the sum of its squared coordinates in an orthogonal basis, just as for the
squared euclidean length of a vector.

1.4 Submodules

We define a submodule in array space (1.1) (or in the space of the
corresponding matrices (1.7)) to be a set which is closed under linear
operations: addition and multiplication by scalars. Remember that
multiplication by scalars means left multiplication by $(p \times p)$-matrices.
For clarity, we shall from now on discuss arrays in their matrix forms.

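Definition 2. A set $\mathcal{L} \subset \mathbb{R}^p_n$ is said to be a
submodule of the space $\mathbb{R}^p_n$ if for any
$\mathcal{X}_1, \mathcal{X}_2 \in \mathcal{L}$

\[ K_1 \mathcal{X}_1 + K_2 \mathcal{X}_2 \in \mathcal{L} \tag{1.17} \]

with arbitrary $(p \times p)$-matrices $K_1$, $K_2$.

Theorem 2. Any submodule $\mathcal{L}$, $\mathcal{L} \subset \mathbb{R}^p_n$,
is formed by some linearly independent system of n-rows. The number of elements
in this system is uniquely determined by $\mathcal{L}$. This number may be
called the dimension of the submodule $\mathcal{L}$.

Proof. Let $\mathcal{X} \in \mathcal{L}$. The set of $(p \times n)$-matrices of
the form $K\mathcal{X}$ (where $K$ is an arbitrary $(p \times p)$-matrix) forms
a submodule. Let us denote it by $\mathcal{L}(\mathcal{X})$. Let
$x_1, \dots, x_p$ be the n-rows of the $(p \times n)$-matrix $\mathcal{X}$. Let
us choose from among these n-rows a maximal linearly independent subsystem,
say $y_1, \dots, y_k$. It is obvious that

\[ \mathcal{L}(\mathcal{X}) = \Bigl\{ \mathcal{Y} \Bigm| \mathcal{Y} = \sum_{i=1}^{k} \beta_i y_i,\ \beta_1, \dots, \beta_k \in \mathbb{R}^p_1 \Bigr\}. \]

If $\mathcal{L}(\mathcal{X}) = \mathcal{L}$, then $y_1, \dots, y_k$ form a
generating basis for $\mathcal{L} \subset \mathbb{R}^p_n$. If
$\mathcal{L}(\mathcal{X}) \ne \mathcal{L}$, then let us find in $\mathcal{L}$
an element, say $\mathcal{Z}$, that does not belong to
$\mathcal{L}(\mathcal{X})$. Let us expand the system $y_1, \dots, y_k$ with the
n-rows $z_1, \dots, z_p$ of the $(p \times n)$-matrix $\mathcal{Z}$. Then we
find in this set of n-rows a maximal linearly independent subsystem, and
repeat. The process ends after finitely many steps, since a linearly
independent system of n-rows has at most n elements. □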

Corollary 3. Any submodule $\mathcal{L} \subset \mathbb{R}^p_n$ can be
expressed as the direct sum of one-dimensional submodules
$\mathcal{L}_i \subset \mathbb{R}^p_n$:

\[ \mathcal{L} = \mathcal{L}_1 \oplus \mathcal{L}_2 \oplus \dots \oplus \mathcal{L}_l, \tag{1.18} \]

where

\[ \mathcal{L}_i = \{ \mathcal{X} \mid \mathcal{X} = \alpha y_i,\ \alpha \in \mathbb{R}^p_1 \} \]

for some $y_i \in \mathbb{R}^1_n$. The number $l$ is the same in any
representation (1.18) of $\mathcal{L}$. This number can be called the dimension
of the submodule $\mathcal{L}$: $l = \dim \mathcal{L}$.

Note. One can choose an orthogonal linearly independent system of
n-rows that generates $\mathcal{L}$. For the proof, it is sufficient to note
that a generating system can be transformed into an orthogonal one by the
process of orthogonalization.

Theorem 2 establishes a one-to-one correspondence between the linear
subspaces of the vector space $\mathbb{R}^1_n$ and the submodules of the matrix
space $\mathbb{R}^p_n$. Let us state this as

Corollary 4. Each linear subspace $L$ in the space of n-rows $\mathbb{R}^1_n$
corresponds to some submodule $\mathcal{L}$ in the space of
$(p \times n)$-matrices $\mathbb{R}^p_n$. The dimensions of the linear subspace
$L$ and the submodule $\mathcal{L}$ coincide.

In this manner, the space $\mathbb{R}^p_n$ (and the corresponding array space)
and the space $\mathbb{R}^1_n$ have an equal "supply" of submodules and linear
subspaces. This leads to significant consequences for multivariate statistical
analysis.

Definition 3. The orthogonal complement of the submodule $\mathcal{L}$ with
respect to the whole space is the set

\[ \mathcal{L}^{\perp} = \{ \mathcal{X} \mid \mathcal{X} \in \mathbb{R}^p_n,\ (\mathcal{X}, \mathcal{Y}) = 0 \ \ \forall\, \mathcal{Y} \in \mathcal{L} \}. \tag{1.19} \]

It is easy to see that $\mathcal{L}^{\perp}$ is a submodule and that

\[ \mathcal{L} \oplus \mathcal{L}^{\perp} = \mathbb{R}^p_n, \qquad \dim \mathcal{L}^{\perp} = n - \dim \mathcal{L}. \]

1.5 Projections onto Submodules 

Let us consider array space (1.1) with the introduced scalar product (1.4).
Let $\mathcal{L}$ be a submodule (1.17). We call the projection of array $T$
onto the submodule $\mathcal{L}$ the point of $\mathcal{L}$ that is closest to
$T$ in the sense of comparing scalar squares (1.5).

In more detail: let array $R$ run through the set $\mathcal{L}$. We shall
call the point $R^{\circ} \in \mathcal{L}$ closest to $T$ if for any
$R \in \mathcal{L}$

\[ (T - R^{\circ}, T - R^{\circ}) \le (T - R, T - R). \]

Note that $(T - R, T - R)$ is a function of $R$ with values in the set of
$(p \times p)$-matrices. The existence of a minimal element in this set of
matrices (generated by $R \in \mathcal{L}$) is not obvious, since symmetric
matrices are only partially ordered. So the existence of
$\operatorname{proj}_{\mathcal{L}} T$ needs to be proved. We state this result
in the following theorem.

Theorem 3. The projection of $T$ onto $\mathcal{L}$ exists, is unique, and has
the expected (euclidean) properties:

1. For any array $R \in \mathcal{L}$,

\[ |T - R|^2 \ge |T - \operatorname{proj}_{\mathcal{L}} T|^2, \]

with equality iff $R = \operatorname{proj}_{\mathcal{L}} T$;

2. $(T - \operatorname{proj}_{\mathcal{L}} T) \perp \mathcal{L}$;

3. $\operatorname{proj}_{\mathcal{L}} (K_1 T_1 + K_2 T_2) = K_1 \operatorname{proj}_{\mathcal{L}} T_1 + K_2 \operatorname{proj}_{\mathcal{L}} T_2$.

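Proof. Let $\mathcal{X} \in \mathbb{R}^p_n$ be an arbitrary
$(p \times n)$-matrix. As was shown, any submodule
$\mathcal{L} \subset \mathbb{R}^p_n$ corresponds to a linear subspace $L$ in
the space of n-rows, $L \subset \mathbb{R}^1_n$. Let $\Pi$ be the projection
matrix onto $L$ in the space $\mathbb{R}^1_n$, that is, for any
$x \in \mathbb{R}^1_n$

\[ \operatorname{proj}_L x = x\Pi. \]

To prove the theorem we need the following Lemma 1 and Theorem 4.

Lemma 1. Let $\mathcal{L} \subset \mathbb{R}^p_n$ be a submodule in the space
of $(p \times n)$-matrices, and let $L \subset \mathbb{R}^1_n$ be the linear
subspace in the space of n-rows which generates $\mathcal{L}$. Then for any
$\lambda \in \mathbb{R}^p_1$

\[ \lambda^T \mathcal{L} \subseteq L, \]

with equality when $\lambda \ne 0$.

Proof of Lemma. Let $r = \dim L$, $r \le n$. Let us choose within $L$ a basis
$e_1, \dots, e_r$. As we know, the submodule
$\mathcal{L} \subset \mathbb{R}^p_n$ can be represented as

\[ \mathcal{L} = \Bigl\{ \mathcal{Y} \Bigm| \mathcal{Y} = \sum_{k=1}^{r} \alpha_k e_k,\ \alpha_1, \dots, \alpha_r \in \mathbb{R}^p_1 \Bigr\}. \]

Let $\mathcal{Y} \in \mathcal{L}$; then for some
$\alpha_1, \dots, \alpha_r \in \mathbb{R}^p_1$

\[ \mathcal{Y} = \sum_{k=1}^{r} \alpha_k e_k. \]

Therefore, for any $\lambda \in \mathbb{R}^p_1$,

\[ \lambda^T \mathcal{Y} = \sum_{k=1}^{r} (\lambda^T \alpha_k) e_k \in L, \]

since $\lambda^T \alpha_1, \dots, \lambda^T \alpha_r$ are numerical
coefficients. □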

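Theorem 4. Let $\mathcal{L}$ be a submodule in the space of
$(p \times n)$-matrices. Let $L$ be the linear subspace in the space of n-rows
which generates $\mathcal{L} \subset \mathbb{R}^p_n$. Let $\Pi$ be the
projection $(n \times n)$-matrix onto $L$, that is, for any
$x \in \mathbb{R}^1_n$

\[ \operatorname{proj}_L x = x\Pi. \]

Then for any $\mathcal{X} \in \mathbb{R}^p_n$

\[ \operatorname{proj}_{\mathcal{L}} \mathcal{X} = \mathcal{X}\Pi. \]

Proof. We must show that for any $\mathcal{Y} \in \mathcal{L}$

\[ (\mathcal{X} - \mathcal{Y}, \mathcal{X} - \mathcal{Y}) \ge (\mathcal{X} - \mathcal{X}\Pi, \mathcal{X} - \mathcal{X}\Pi) \tag{1.20} \]

with equality if and only if $\mathcal{Y} = \mathcal{X}\Pi$. The inequality
between two symmetric $(p \times p)$-matrices in (1.20) means that for any
$\lambda \in \mathbb{R}^p_1$

\[ \lambda^T (\mathcal{X} - \mathcal{Y}, \mathcal{X} - \mathcal{Y}) \lambda \ge \lambda^T (\mathcal{X} - \mathcal{X}\Pi, \mathcal{X} - \mathcal{X}\Pi) \lambda, \]

that is,

\[ |\lambda^T (\mathcal{X} - \mathcal{Y})|^2 \ge |\lambda^T (\mathcal{X} - \mathcal{X}\Pi)|^2, \]

that is,

\[ |\lambda^T \mathcal{X} - \lambda^T \mathcal{Y}|^2 \ge |\lambda^T \mathcal{X} - (\lambda^T \mathcal{X})\Pi|^2. \]

As was noted above, the n-row $y = \lambda^T \mathcal{Y}$ belongs to $L$, and
$\lambda^T \mathcal{X}\Pi = x\Pi$, where $x = \lambda^T \mathcal{X}$, is the
projection of $\lambda^T \mathcal{X}$ onto $L$. Due to the properties of
euclidean projection, we get for any $y \in L$

\[ |x - y|^2 \ge |x - x\Pi|^2 \]

with equality if and only if $y = x\Pi$. Thus, $\mathcal{X}\Pi$ is the nearest
point to $\mathcal{X}$ in $\mathcal{L}$. □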
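Theorem 4 reduces projection onto a submodule to a single matrix product. The following sketch rests on assumptions of our own: a hypothetical two-dimensional subspace $L$ of 4-rows, its projection matrix written out by hand, and arbitrary data. It computes $\mathcal{X}\Pi$, checks that the residual is orthogonal to $\mathcal{L}$, and compares scalar squares for one competitor $\mathcal{Y} \in \mathcal{L}$.

```python
from fractions import Fraction as F


def matmul(A, B):
    """Ordinary row-by-column matrix product on nested lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def sub(A, B):
    """Elementwise difference of two equally shaped matrices."""
    return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


half = F(1, 2)
basis = [[1, 1, 0, 0], [0, 0, 1, 1]]      # rows spanning L (hypothetical)
Pi = [[half, half, 0, 0],                 # projection matrix onto L
      [half, half, 0, 0],
      [0, 0, half, half],
      [0, 0, half, half]]

X = [[1, 3, 2, 6], [0, 4, 5, 1]]          # p = 2, n = 4
proj = matmul(X, Pi)                      # proj X = X Pi
residual = sub(X, proj)

# The residual is orthogonal to the rows generating L, hence to all of the submodule.
cross = matmul(residual, transpose(basis))

# A competitor Y in the submodule: its rows lie in L.
Y = [[1, 1, 0, 0], [0, 0, 2, 2]]
gap = sub(matmul(sub(X, Y), transpose(sub(X, Y))),
          matmul(residual, transpose(residual)))
print(cross, gap)
```

For this competitor the matrix `gap` is nonnegative definite, in agreement with inequality (1.20); one example of course does not replace the proof.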

Now we return to proving Theorem 3. From Theorem 4 we know that
$\mathcal{X}\Pi$ is the unique projection of $\mathcal{X}$ onto $\mathcal{L}$.
So, statement 1 of Theorem 3 is proven.

The explicit expression
$\operatorname{proj}_{\mathcal{L}} \mathcal{X} = \mathcal{X}\Pi$ confirms that
the operation of projection onto a submodule is a linear operation. So,
statement 3 of Theorem 3 is proven as well.

To complete the proof of Theorem 3 we need to show statement 2. Let
$e_1, \dots, e_r$ be an orthogonal basis of $L$ and $e_{r+1}, \dots, e_n$ be an
orthogonal basis of $L^{\perp}$. Then $e_1, \dots, e_n$ is an orthogonal basis
of $\mathbb{R}^1_n$. In this orthogonal basis, if

\[ \mathcal{X} = \sum_{i=1}^{n} \alpha_i e_i, \]

then

\[ \mathcal{X} - \mathcal{X}\Pi = \sum_{i=r+1}^{n} \alpha_i e_i. \]

Since $\mathcal{Y} \in \mathcal{L}$,

\[ \mathcal{Y} = \sum_{i=1}^{r} \beta_i e_i \]

for some $\beta_1, \dots, \beta_r \in \mathbb{R}^p_1$. Therefore:

\[ (\mathcal{X} - \mathcal{X}\Pi)\mathcal{Y}^T
 = \Bigl( \sum_{i=r+1}^{n} \alpha_i e_i,\ \sum_{j=1}^{r} \beta_j e_j \Bigr)
 = \sum_{i=r+1}^{n} \sum_{j=1}^{r} (\alpha_i e_i,\ \beta_j e_j)
 = \sum_{i=r+1}^{n} \sum_{j=1}^{r} \alpha_i e_i (\beta_j e_j)^T
 = \sum_{i=r+1}^{n} \sum_{j=1}^{r} \alpha_i e_i e_j^T \beta_j^T = 0, \]

since $e_i e_j^T = 0$ when $i \ne j$.

□



1.6 Matrix Least Squares Method 

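Calculating projections onto a submodule $\mathcal{L} \subset \mathbb{R}^p_n$
becomes easier if the form of the projection onto the linear subspace $L$
which generates $\mathcal{L}$ is known. By the lemma from Section 1.5, for any
$\lambda \in \mathbb{R}^p_1$

\[ \lambda^T \operatorname{proj}_{\mathcal{L}} \mathcal{X} = \operatorname{proj}_L (\lambda^T \mathcal{X}). \tag{1.21} \]

Assume that for the right-hand side of (1.21) we have an explicit formula
$y = \operatorname{proj}_L x$. Because of linearity, this formula gives for
$\operatorname{proj}_L(\lambda^T \mathcal{X})$ an explicit expression of the
form $\lambda^T \mathcal{Y}$. Therefore

\[ \lambda^T \operatorname{proj}_{\mathcal{L}} \mathcal{X} = \lambda^T \mathcal{Y}. \tag{1.22} \]

Since $\lambda$ is arbitrary, we get an explicit expression for
$\operatorname{proj}_{\mathcal{L}} \mathcal{X}$. One can say this is the
calculation of $\operatorname{proj}_{\mathcal{L}} \mathcal{X}$ by Roy's
method [5].

Example: calculating the arithmetic mean. Let $X_1, X_2, \dots, X_n$
be a set of p-columns. Let us consider the array
$T = \{X_i \mid i = \overline{1,n}\}$ and represent it in matrix form:

\[ \mathcal{X} = \| X_1, X_2, \dots, X_n \|. \tag{1.23} \]

Our task is to find the array $\mathcal{Y}$ with identical columns, i.e., an
array of the form

\[ \mathcal{Y} = \| Y, Y, \dots, Y \|, \qquad Y \in \mathbb{R}^p_1, \tag{1.24} \]

closest to (1.23). Arrays of form (1.24) form a one-dimensional submodule.
We shall denote it by $\mathcal{L}$, $\mathcal{L} \subset \mathbb{R}^p_n$. We
have to find $\operatorname{proj}_{\mathcal{L}} \mathcal{X}$. The submodule
$\mathcal{L}$ is generated by the one-dimensional linear subspace $L$,
$L \subset \mathbb{R}^1_n$, spanned by the n-row $e = (1, 1, \dots, 1)$.

Let $x$ be an arbitrary n-row, $x = (x_1, \dots, x_n)$. The form of the
projection of $x$ onto $L$ is well known:

\[ \operatorname{proj}_L x = (\bar{x}, \dots, \bar{x}). \]

Applying Roy's method to the matrix $\mathcal{X}$ (1.23), we pass to the n-row
$x = \lambda^T \mathcal{X}$, where $x_i = \lambda^T X_i$,
$i = \overline{1,n}$. It is then clear that

\[ \operatorname{proj}_L x = (\lambda^T \bar{X}, \dots, \lambda^T \bar{X}). \]

Therefore,

\[ \operatorname{proj}_{\mathcal{L}} \mathcal{X} = (\bar{X}, \dots, \bar{X}). \tag{1.25} \]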
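The reduction used in this example can be traced on numbers. This is a sketch under our own assumptions: the data, the test vector `lam` (standing for $\lambda$) and the helper `project_row_on_ones` are hypothetical. It projects the n-row $\lambda^T \mathcal{X}$ onto $L$ and compares the result with $\lambda^T \bar{X}$.

```python
from fractions import Fraction as F

X = [[1, 2, 6],
     [4, 0, 2]]           # matrix form (1.23): columns X_1, X_2, X_3, p = 2
n = len(X[0])

# Column mean X-bar, the common column of proj X in (1.25).
x_bar = [F(sum(row), n) for row in X]


def project_row_on_ones(x):
    """Euclidean projection of an n-row onto span{(1, ..., 1)}:
    every entry is replaced by the arithmetic mean of the row."""
    mean = F(sum(x), len(x))
    return [mean] * len(x)


lam = [2, -1]             # an arbitrary p-column lambda
x = [sum(l * X[a][i] for a, l in enumerate(lam)) for i in range(n)]   # lambda^T X

lhs = project_row_on_ones(x)                            # proj_L (lambda^T X)
rhs = [sum(l * m for l, m in zip(lam, x_bar))] * n      # lambda^T (X-bar, ..., X-bar)
print(lhs == rhs)                                       # True
```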
Of course, this is not the only method and not always the most efficient one.
In this example, as in other cases, one can apply the matrix method of least
squares and find

\[ \hat{Y} = \arg\min_{Y} \sum_{i=1}^{n} (X_i - Y)(X_i - Y)^T. \tag{1.26} \]

Solution. Let us transform the function in (1.26): for any
$Y \in \mathbb{R}^p_1$

\[ \sum_{i=1}^{n} (X_i - Y)(X_i - Y)^T = \sum_{i=1}^{n} [(X_i - \bar{X}) + (\bar{X} - Y)][(X_i - \bar{X}) + (\bar{X} - Y)]^T = \]

\[ = \sum_{i=1}^{n} (X_i - \bar{X})(X_i - \bar{X})^T + \sum_{i=1}^{n} (\bar{X} - Y)(\bar{X} - Y)^T = (1) + (2), \tag{1.27} \]

since the "paired products" vanish:

\[ \sum_{i=1}^{n} (X_i - \bar{X})(\bar{X} - Y)^T = 0, \qquad \sum_{i=1}^{n} (\bar{X} - Y)(X_i - \bar{X})^T = 0. \]

Now the function (1.27) is a sum of two nonnegative definite matrices, and
the first one does not depend on $Y$. The minimum is attained at
$Y = \bar{X}$: at that point the nonnegative definite matrix (2) turns to zero.
The answer is the arithmetic mean, that is,

\[ \hat{Y} = \bar{X}. \]

Of course, this is well known. It can also be found by applying not the matrix
but the ordinary method of least squares:

\[ \hat{Y} = \arg\min_{Y} \sum_{i=1}^{n} (X_i - Y)^T (X_i - Y). \]
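The decomposition (1.27) can be checked for a particular candidate $Y$. The sketch below uses our own hypothetical data and a helper `scatter` that is not part of the original text; exact fractions are used so that the comparison is an exact equality.

```python
from fractions import Fraction as F


def scatter(columns, center):
    """Sum over i of (X_i - center)(X_i - center)^T."""
    p = len(center)
    return [[sum((x[a] - center[a]) * (x[b] - center[b]) for x in columns)
             for b in range(p)] for a in range(p)]


columns = [[1, 4], [2, 0], [6, 2]]        # X_1, X_2, X_3 with p = 2
n = len(columns)
x_bar = [F(sum(x[a] for x in columns), n) for a in range(2)]

Y = [1, 1]                                # an arbitrary candidate
total = scatter(columns, Y)               # left-hand side of (1.27)
term1 = scatter(columns, x_bar)           # matrix (1): does not depend on Y
term2 = [[n * (x_bar[a] - Y[a]) * (x_bar[b] - Y[b]) for b in range(2)]
         for a in range(2)]               # matrix (2): vanishes at Y = X-bar

recombined = [[term1[a][b] + term2[a][b] for b in range(2)] for a in range(2)]
print(total == recombined)                # True
```

Taking traces of both sides gives the corresponding identity for the ordinary (scalar) least squares criterion.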

The results of the matrix method relate to those of the traditional method in
the same way in the case of projection onto other submodules
$\mathcal{L} \subset \mathbb{R}^p_n$. The reason is simple: if an array
$\mathcal{Y}$ is the solution of the matrix problem

\[ \sum_{i=1}^{n} (X_i - Z_i)(X_i - Z_i)^T = (\mathcal{X} - \mathcal{Z})(\mathcal{X} - \mathcal{Z})^T \to \min_{\mathcal{Z} \in \mathcal{L}}, \]

then $\mathcal{Y}$ is a solution of the scalar problem as well,

\[ \operatorname{tr} \Bigl\{ \sum_{i=1}^{n} (X_i - Z_i)(X_i - Z_i)^T \Bigr\} = \sum_{i=1}^{n} (X_i - Z_i)^T (X_i - Z_i) \to \min_{\mathcal{Z} \in \mathcal{L}}. \]

Thus, for instance, in calculating projections onto submodules one can use
the traditional scalar method of least squares. Both least squares methods
give the same estimates of parameters in linear models. The necessity
of matrix scalar products and of the matrix form of orthogonality, projection,
submodules, etc. becomes apparent in testing linear hypotheses. We shall
discuss this in the next section.

2 Multivariate Linear Models 
2.1 Arrays with Random Elements 

Let us consider an array (2.1), the elements of which are p-dimensional random
variables presented in the form of p-columns:

\[ T = \{X_i \mid i = \overline{1,n}\}. \tag{2.1} \]

Remember that we treat such an array as a row composed of p-columns
under algebraic operations. For arrays of form (2.1) with random elements,
let us define mathematical expectation and covariance. The array

\[ \mathrm{E}\, T = \{\mathrm{E} X_i \mid i = \overline{1,n}\} \tag{2.2} \]

is called the mathematical expectation of $T$. We define the covariance matrix
of array (2.1) much like the covariance matrix of a random vector. Let

\[ t = (x_1, \dots, x_n) \]

be an n-row composed of random variables $x_1, x_2, \dots, x_n$. As we know,
the covariance matrix $\operatorname{Var} t$ of the random vector $t$ is an
$(n \times n)$-matrix with elements

\[ \sigma_{ij} = \operatorname{Cov}(x_i, x_j), \quad \text{where } i, j = \overline{1,n}. \]

Algebraically, with the help of matrix operations, the covariance matrix of
the random vector $t$ can be defined as

\[ \operatorname{Var} t = \mathrm{E}\,(t - \mathrm{E} t)^T (t - \mathrm{E} t). \tag{2.3} \]

Following (2.3), we define the covariance array of random array (2.1) as

\[ \operatorname{Var} T := \mathrm{E}\,(T - \mathrm{E} T)^T (T - \mathrm{E} T) = \{\operatorname{Cov}(X_i, X_j) \mid i, j = \overline{1,n}\}. \tag{2.4} \]

Here $\operatorname{Cov}(X_i, X_j)$ is the covariance matrix of the random
column-vectors $X_i$ and $X_j$,

\[ \operatorname{Cov}(X_i, X_j) = \mathrm{E}\,(X_i - \mathrm{E} X_i)(X_j - \mathrm{E} X_j)^T. \tag{2.5} \]

Note that we consider $\operatorname{Var} T$ (2.4) as a square array of
dimensions $(n \times n)$, the elements of which are $(p \times p)$-matrices
(2.5).

Let us consider the array $R$ obtained by a linear transformation of
array $T$ (2.1),

\[ R = TQ, \tag{2.6} \]

where $Q$ is an $(n \times n)$-matrix.
It is clear that

\[ \mathrm{E} R = (\mathrm{E} T) Q, \]

\[ \operatorname{Var} R = \mathrm{E}\,[(TQ - \mathrm{E}\, TQ)^T (TQ - \mathrm{E}\, TQ)] = Q^T (\operatorname{Var} T) Q. \tag{2.7} \]

In mathematical statistics, arrays with statistically independent random
elements are of special interest when the covariance matrices of these
elements are the same. Let $T$ (2.1) be an array such that

\[ \operatorname{Cov}(X_i, X_j) = \delta_{ij} \Sigma, \qquad i, j = \overline{1,n}. \tag{2.8} \]

Here $\Sigma$ is a nonnegative definite $(p \times p)$-matrix and
$\delta_{ij}$ is the Kronecker symbol. Let us consider an orthogonal
transformation of array $T$:

\[ R = TC, \tag{2.9} \]

where $C$ is an orthogonal $(n \times n)$-matrix. The following lemma is fairly
simple but important.

Lemma 2.

\[ \operatorname{Var} R = \operatorname{Var} T = \{\delta_{ij} \Sigma \mid i, j = \overline{1,n}\}. \tag{2.10} \]

This lemma generalizes to the multivariate case the well-known property
of spherical normal distributions.

Proof. The proof of the lemma is straightforward. To simplify the formulas,
assume that $\mathrm{E} T = 0$. Then, by (2.7),

\[ \operatorname{Var}(TC) = \mathrm{E}\,[(TC)^T (TC)] = C^T (\operatorname{Var} T) C = \]

\[ = C^T \{\delta_{ij} \Sigma \mid i, j = \overline{1,n}\}\, C = \{\delta_{ij} \Sigma \mid i, j = \overline{1,n}\}. \]

□
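The block computation behind Lemma 2 can be sketched as follows. Everything named here is an assumption of ours: the function `transform_covariance`, the matrix `Sigma` (standing for $\Sigma$) and the orthogonal matrix `C` with rational entries. The covariance array is stored as an $n \times n$ nested list of $(p \times p)$ blocks, and (2.7) is applied block by block.

```python
from fractions import Fraction as F


def transform_covariance(var_T, Q):
    """Blocks of Q^T (Var T) Q as in (2.7):
    block (i, j) equals sum over k, l of q_ki q_lj (Var T)_kl."""
    n = len(Q)
    p = len(var_T[0][0])
    return [[[[sum(Q[k][i] * Q[l][j] * var_T[k][l][a][b]
                   for k in range(n) for l in range(n))
               for b in range(p)] for a in range(p)]
             for j in range(n)] for i in range(n)]


Sigma = [[2, 1], [1, 3]]                      # common covariance matrix, p = 2
zero = [[0, 0], [0, 0]]
var_T = [[Sigma, zero], [zero, Sigma]]        # {delta_ij Sigma}, n = 2

C = [[F(3, 5), F(4, 5)],                      # an orthogonal (2 x 2) matrix
     [F(-4, 5), F(3, 5)]]

var_R = transform_covariance(var_T, C)
print(var_R == var_T)                         # True
```

With a non-orthogonal matrix in place of `C` the off-diagonal blocks would in general no longer vanish.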

Earlier, while discussing generating bases and coordinates (Section 1.3),
we established that the transformation from the coordinates of array $T$
in one orthogonal basis to the coordinates of this array in another orthogonal
basis is performed through multiplication by an orthogonal matrix. Therefore,
if the coordinates of some array in one orthogonal basis are uncorrelated and
have a common covariance matrix, then the coordinates of the given array
keep these properties in any orthogonal basis. From this remark and from
Lemma 2, just established, follows

Lemma 3. If the coordinates of a random array in an orthogonal basis are
uncorrelated and have a common covariance matrix, then the coordinates of
this array are uncorrelated and have the same common covariance matrix in any
orthogonal basis.

This property is of great importance in studying linear statistical models.

Finally, let us note that in introducing and discussing covariance arrays
of random arrays we have to work with the arrays themselves (1.1) and not
with the matrices (1.7) representing them.

2.2 Linear Models and Linear Hypotheses 

Definition 4. One says that an array $T$ (2.1) with random elements obeys a
linear model if

a) for some given submodule $\mathcal{L}$,

\[ \mathrm{E}\, T \in \mathcal{L}; \tag{2.11} \]

b) elements $X_1, \dots, X_n$ of array $T$ are independent and identically
distributed.

If the distribution common to all $X_i$, $i = \overline{1,n}$, is gaussian,
then we say that array $T$ follows a linear gaussian model. We will now study
linear gaussian models.

We shall denote by $\Sigma$ the common covariance matrix of all the p-columns
$X_i$. The array $\mathrm{E}\, T$ and the matrix $\Sigma$ are the parameters of
the model. They are generally unknown, although $\Sigma$ is assumed to be
nondegenerate.

For random arrays following the gaussian model, linear hypotheses are
often discussed. Within the framework of the linear model (2.11) a linear
hypothesis has the form

\[ \mathrm{E}\, T \in \mathcal{L}_1, \tag{2.12} \]

where $\mathcal{L}_1$ is a given submodule and
$\mathcal{L}_1 \subset \mathcal{L}$.

Let us show that the linear models and linear hypotheses discussed in
multivariate statistical analysis have the structure of (2.11) and (2.12). The
main linear models are factor models and regression models. As examples, let us
consider the one-way layout and the regression models of multivariate
statistical analysis.

The one-way layout model is the simplest of the "analysis of variance"
models. It is a shift problem for several (say, m) normal samples with
identical covariance matrices. The array of observations in this problem has
double numeration:

\[ T = \{X_{ij} \mid j = \overline{1,m},\ i = \overline{1,n_j}\}. \tag{2.13} \]

Here $m$ is the number of different levels of the factor which affects the
expected values of the response, and $n_j$ is the number of independently
repeated observations of the response at level $j$ of the factor,
$j = \overline{1,m}$. Finally, the multivariate variables $X_{ij}$ are
independent realizations of a p-dimensional response,
$X_{ij} \in \mathbb{R}^p_1$. Let $N = n_1 + \dots + n_m$. The main assumption
of the model is $X_{ij} \sim N_p(a_j, \Sigma)$.

We shall linearly order the observations which constitute the array (2.13)
and then represent (2.13) as a $(p \times N)$-matrix

\[ \mathcal{X} = \| X_{11}, X_{12}, \dots, X_{1 n_1},\ X_{21}, \dots, X_{2 n_2},\ \dots,\ X_{m1}, \dots, X_{m n_m} \|. \tag{2.14} \]

Note that

\[ \mathrm{E}\, \mathcal{X} = \| \underbrace{a_1, \dots, a_1}_{n_1 \text{ times}},\ \underbrace{a_2, \dots, a_2}_{n_2 \text{ times}},\ \dots,\ \underbrace{a_m, \dots, a_m}_{n_m \text{ times}} \|. \tag{2.15} \]

Let us introduce the N-rows

\[ \begin{aligned}
e_1 &= (\underbrace{1, \dots, 1}_{n_1 \text{ times}}, 0, \dots, 0), \\
e_2 &= (\underbrace{0, \dots, 0}_{n_1 \text{ times}}, \underbrace{1, \dots, 1}_{n_2 \text{ times}}, 0, \dots, 0), \\
&\ \ \vdots \\
e_m &= (\underbrace{0, \dots, 0}_{n_1 \text{ times}}, \underbrace{0, \dots, 0}_{n_2 \text{ times}}, \dots, \underbrace{1, \dots, 1}_{n_m \text{ times}}).
\end{aligned} \tag{2.16} \]

It is obvious that

\[ \mathrm{E}\, \mathcal{X} = \sum_{j=1}^{m} a_j e_j. \]

Therefore $\mathrm{E}\, \mathcal{X}$ belongs to an m-dimensional submodule of
the space $\mathbb{R}^p_N$ generated by the N-rows (2.16).
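The structure of the one-way layout can be made concrete with a small sketch. The group sizes, the group means and the helper `indicator_rows` below are hypothetical choices of ours. The code builds the N-rows (2.16) and assembles the expectation matrix as the sum of the products $a_j e_j$.

```python
def indicator_rows(sizes):
    """N-rows e_1, ..., e_m of (2.16): e_j has ones exactly at the
    positions of the observations made at level j of the factor."""
    total = sum(sizes)
    rows, start = [], 0
    for size in sizes:
        rows.append([1 if start <= k < start + size else 0
                     for k in range(total)])
        start += size
    return rows


group_sizes = [2, 3]            # n_1, n_2 (hypothetical), so m = 2 and N = 5
means = [[1, 0], [4, 5]]        # a_1, a_2 as p-columns with p = 2

e = indicator_rows(group_sizes)
N, p, m = sum(group_sizes), len(means[0]), len(means)

# E X = sum_j a_j e_j: entry (a, k) collects the a-th coordinate of the
# mean of the group to which observation k belongs.
EX = [[sum(means[j][a] * e[j][k] for j in range(m)) for k in range(N)]
      for a in range(p)]

print(e)    # [[1, 1, 0, 0, 0], [0, 0, 1, 1, 1]]
print(EX)   # [[1, 1, 4, 4, 4], [0, 0, 5, 5, 5]]
```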



The hypothesis $H_0 : a_1 = a_2 = \dots = a_m$, with which one usually begins
the statistical analysis of $m$ samples, is obviously a linear hypothesis in
the sense of (2.12): $H_0 : \mathrm{E}X \in \mathcal{L}_1$, where
$\mathcal{L}_1$ is the one-dimensional submodule spanned by the single
$N$-row $e = e_1 + \dots + e_m$.

Multivariate Multiple Regression in matrix form is

$$ \mathcal{Y} = AX + \mathcal{E}, \tag{2.17} $$

where $\mathcal{Y} = \|Y_1, Y_2, \dots, Y_n\|$. Here $\mathcal{Y}$ is the
observed $(p \times n)$-matrix of the $p$-dimensional response; $X$ is a given
design $(m \times n)$-matrix; $A$ is a $(p \times m)$-matrix of unknown
regression coefficients; $\mathcal{E} = \|E_1, E_2, \dots, E_n\|$ is a
$(p \times n)$-matrix composed of independent $p$-variate random errors
$E_1, E_2, \dots, E_n$. In gaussian models

$$ E_i \sim N_p(0, \Sigma), $$

where the $(p \times p)$-matrix $\Sigma$ is assumed to be non-degenerate.
Generally $\Sigma$ is assumed to be unknown.

Let $A_1, A_2, \dots, A_m$ be the $p$-columns forming the matrix $A$; let
$x_1, x_2, \dots, x_m$ be the $n$-rows forming the matrix $X$. Then

$$ AX = \sum_{i=1}^{m} A_i x_i. \tag{2.18} $$

The resulting expression (2.18) shows that $\mathrm{E}\mathcal{Y} = AX$
belongs to an $m$-dimensional submodule of the space $\mathbb{R}^p_n$
generated by the linear system of $n$-rows $x_1, x_2, \dots, x_m$.
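The identity (2.18) is the usual "column times row" expansion of a matrix product. Below is a minimal sketch in plain Python that checks it numerically; the matrices `A` and `X` are hypothetical toy data (p = 3, m = 2, n = 4) and the helper names are not part of the original text.

```python
def matmul(A, B):
    """Plain-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def outer(column, row):
    """Product of a p-column and an n-row: a (p x n)-matrix."""
    return [[c * r for r in row] for c in column]


def add(P, Q):
    """Entrywise sum of two matrices of equal size."""
    return [[x + y for x, y in zip(rp, rq)] for rp, rq in zip(P, Q)]


# Hypothetical coefficient matrix A (p x m) and design matrix X (m x n).
A = [[1, 2], [0, -1], [3, 1]]
X = [[1, 1, 1, 1],     # x_1: intercept row
     [0, 1, 2, 3]]     # x_2: a single regressor

product = matmul(A, X)

# Sum over i of (p-column A_i) times (n-row x_i).
columns_of_A = [list(col) for col in zip(*A)]
total = [[0] * len(X[0]) for _ in range(len(A))]
for A_i, x_i in zip(columns_of_A, X):
    total = add(total, outer(A_i, x_i))

print(product == total)
```

Each term `outer(A_i, x_i)` lies in the one-dimensional submodule generated by the row `x_i`, which is why the sum lies in the submodule generated by all the rows of the design matrix.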

2.3 Sufficient Statistics and Best Unbiased Estimates 

Let us consider a linear gaussian model (2.11) in matrix form

$$ X = M + \mathcal{E}, \tag{2.19} $$

where $M = \mathrm{E}X$ is an unknown $(p \times n)$-matrix,

$$ M = \|M_1, M_2, \dots, M_n\| \in \mathcal{L}, $$

where $\mathcal{L}$ is a submodule of $\mathbb{R}^p_n$, and

$$ \mathcal{E} = \|E_1, E_2, \dots, E_n\| $$

is a $(p \times n)$-matrix, the $p$-columns $E_1, E_2, \dots, E_n$ of which
are independent $N_p(0, \Sigma)$ random variables.

The unknown parameter of this gaussian model is the pair $(M, \Sigma)$. Let
us find sufficient statistics for this parameter using the factorization
criterion.

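The likelihood of the pair $(M, \Sigma)$ based on $X$ is

$$ \prod_{i=1}^{n} \left(\frac{1}{\sqrt{2\pi}}\right)^{p} \frac{1}{\sqrt{\det\Sigma}} \exp\left\{ -\frac{1}{2} (X_i - M_i)^T \Sigma^{-1} (X_i - M_i) \right\} $$

$$ = \left(\frac{1}{\sqrt{2\pi}}\right)^{np} \left(\frac{1}{\sqrt{\det\Sigma}}\right)^{n} \exp\left\{ -\frac{1}{2} \operatorname{tr} \Sigma^{-1} \left[ \sum_{i=1}^{n} (X_i - M_i)(X_i - M_i)^T \right] \right\}. \tag{2.20} $$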

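The sum in square brackets is $(X - M, X - M)$. Let us represent $X - M$ as

$$ X - M = (X - \operatorname{proj}_{\mathcal{L}} X) + (\operatorname{proj}_{\mathcal{L}} X - M) = (1) + (2) $$

and note that

$$ (1) = \operatorname{proj}_{\mathcal{L}^\perp} X \in \mathcal{L}^\perp, \qquad (2) \in \mathcal{L}. $$

Thus (Pythagorean Theorem)

$$ (X - M, X - M) = (\operatorname{proj}_{\mathcal{L}^\perp} X, \operatorname{proj}_{\mathcal{L}^\perp} X) + (\operatorname{proj}_{\mathcal{L}} X - M, \operatorname{proj}_{\mathcal{L}} X - M). \tag{2.21} $$

We conclude that the likelihood (2.20) is expressed through the statistics
$\operatorname{proj}_{\mathcal{L}} X$ and
$(\operatorname{proj}_{\mathcal{L}^\perp} X, \operatorname{proj}_{\mathcal{L}^\perp} X)$,
which are sufficient for $(M, \Sigma)$.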
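The decomposition (2.21) can be verified exactly on a small one-way layout, where the projection on $\mathcal{L}$ replaces every column by the mean of its group. The sketch below is plain Python with exact rational arithmetic; the data, the group sizes and the helper names are hypothetical, and the scalar product is taken as $(P, Q) = PQ^T$, in agreement with the bracketed sum in (2.20).

```python
from fractions import Fraction


def gram(P, Q):
    """Matrix scalar product (P, Q) = P Q^T of two (p x n)-matrices."""
    return [[sum(a * b for a, b in zip(rp, rq)) for rq in Q] for rp in P]


def sub(P, Q):
    return [[x - y for x, y in zip(rp, rq)] for rp, rq in zip(P, Q)]


def add(P, Q):
    return [[x + y for x, y in zip(rp, rq)] for rp, rq in zip(P, Q)]


def project_on_groups(X, sizes):
    """Projection on the submodule spanned by the group indicator rows:
    every entry is replaced by the mean of its group (row by row)."""
    out = []
    for row in X:
        new_row, start = [], 0
        for n_j in sizes:
            mean = sum(row[start:start + n_j], Fraction(0)) / n_j
            new_row.extend([mean] * n_j)
            start += n_j
        out.append(new_row)
    return out


def to_fractions(rows):
    return [[Fraction(v) for v in row] for row in rows]


sizes = [2, 3]                                            # hypothetical n_1, n_2
X = to_fractions([[1, 3, 2, 4, 6], [0, 2, 5, 5, 2]])      # p = 2, N = 5
M = to_fractions([[1, 1, 5, 5, 5], [2, 2, 3, 3, 3]])      # any element of the submodule

PX = project_on_groups(X, sizes)
residual = sub(X, PX)                                     # projection on the complement

lhs = gram(sub(X, M), sub(X, M))
rhs = add(gram(residual, residual), gram(sub(PX, M), sub(PX, M)))
cross = gram(residual, sub(PX, M))                        # vanishes by orthogonality

print(lhs == rhs, cross)
```

The cross term is the zero matrix for every admissible `M`, so only the second summand of (2.21) depends on the unknown mean.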

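The statistic $\operatorname{proj}_{\mathcal{L}} X$ is obviously an unbiased
estimate of $M$. As a function of sufficient statistics it is the best
unbiased estimate of $M$. We can show that the best unbiased estimate of
$\Sigma$ is the statistic

$$ \frac{1}{\dim \mathcal{L}^\perp} (\operatorname{proj}_{\mathcal{L}^\perp} X, \operatorname{proj}_{\mathcal{L}^\perp} X) \tag{2.22} $$

after proving the following Theorem 5.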

2.4 Theorem of Orthogonal Decomposition 

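Theorem 5. Let $X = \|X_1, X_2, \dots, X_n\|$ be a gaussian
$(p \times n)$-matrix with independent $p$-columns
$X_1, X_2, \dots, X_n \in \mathbb{R}^p_1$, and $\operatorname{Var} X_i = \Sigma$
for all $i = 1, \dots, n$. Let $\mathcal{L}_1, \mathcal{L}_2, \dots$ be
pairwise orthogonal submodules of $\mathbb{R}^p_n$, the direct sum of which
forms $\mathbb{R}^p_n$:

$$ \mathbb{R}^p_n = \mathcal{L}_1 \oplus \mathcal{L}_2 \oplus \dots $$

Let us consider the decomposition of the $(p \times n)$-matrix $X$ into the
sum of orthogonal projections of $X$ on the submodules
$\mathcal{L}_1, \mathcal{L}_2, \dots$:

$$ X = \operatorname{proj}_{\mathcal{L}_1} X + \operatorname{proj}_{\mathcal{L}_2} X + \dots $$

Then:

a) the random $(p \times n)$-matrices $\operatorname{proj}_{\mathcal{L}_1} X$,
$\operatorname{proj}_{\mathcal{L}_2} X, \dots$ are independent, normally
distributed, and
$\mathrm{E} \operatorname{proj}_{\mathcal{L}_i} X = \operatorname{proj}_{\mathcal{L}_i} \mathrm{E}X$;

b) $(\operatorname{proj}_{\mathcal{L}_i} X, \operatorname{proj}_{\mathcal{L}_i} X) = W_p(\dim \mathcal{L}_i, \Sigma, \Delta_i)$,
where $W_p(\nu, \Sigma, \Delta)$ denotes a random $(p \times p)$-matrix
having the Wishart distribution with $\nu$ degrees of freedom and the
parameter of noncentrality $\Delta$. In this case

$$ \Delta_i = (\operatorname{proj}_{\mathcal{L}_i} \mathrm{E}X, \operatorname{proj}_{\mathcal{L}_i} \mathrm{E}X). $$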

Proof. Each submodule $\mathcal{L} \subset \mathbb{R}^p_n$ has a one-to-one
correspondence to some linear subspace $L \subset \mathbb{R}^1_n$, which
generates it, and $\dim \mathcal{L} = \dim L$. Let the submodules
$\mathcal{L}_1, \mathcal{L}_2, \dots \subset \mathbb{R}^p_n$ correspond to the
subspaces $L_1, L_2, \dots \subset \mathbb{R}^1_n$. The subspaces
$L_1, L_2, \dots \subset \mathbb{R}^1_n$ are pairwise orthogonal, and their
direct sum forms the entire space $\mathbb{R}^1_n$. Let us denote the
dimensions of the submodules
$\mathcal{L}_1, \mathcal{L}_2, \dots \subset \mathbb{R}^p_n$ (and of the
subspaces $L_1, L_2, \dots \subset \mathbb{R}^1_n$) by $m_1, m_2, \dots$.

Let us choose in every subspace $L_1, L_2, \dots$ an orthonormal basis. For
$L_1$ let it be the $n$-rows $f_1, \dots, f_{m_1}$; for $L_2$, the $n$-rows
$f_{m_1+1}, \dots, f_{m_1+m_2}$, etc. With the help of these $n$-rows each of
the submodules $\mathcal{L}_1, \mathcal{L}_2, \dots$ can be represented as
the direct sum of one-dimensional submodules from $\mathbb{R}^p_n$. For
example,
$\mathcal{L}_1 = \mathcal{F}_1 \oplus \mathcal{F}_2 \oplus \dots \oplus \mathcal{F}_{m_1}$,
where

$$ \mathcal{F}_i = \{ Y \mid Y = \alpha f_i,\ \alpha \in \mathbb{R}^p_1 \}, \qquad i = 1, \dots, m_1. $$

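The set of all $n$-rows $f_1, f_2, \dots, f_n$ forms an orthonormal basis in
$\mathbb{R}^1_n$ and therefore a generating basis in $\mathbb{R}^p_n$.
Therefore any $(p \times n)$-matrix $X \in \mathbb{R}^p_n$ can be represented
in the form

$$ X = \sum_{i=1}^{n} Y_i f_i, $$

where $Y_1, \dots, Y_n$ are some $p$-columns, that is
$Y_1, \dots, Y_n \in \mathbb{R}^p_1$, and

$$ \operatorname{proj}_{\mathcal{L}_1} X = \sum_{i=1}^{m_1} Y_i f_i, \qquad \operatorname{proj}_{\mathcal{L}_2} X = \sum_{i=m_1+1}^{m_1+m_2} Y_i f_i, \quad \text{etc.} $$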

Here the $p$-columns $Y_1, Y_2, \dots, Y_n$ are the coordinates of the
$(p \times n)$-matrix $X$ relative to the generating basis
$f_1, \dots, f_n$, while the $p$-columns $X_1, X_2, \dots, X_n$ are the
coordinates of the same $(p \times n)$-matrix $X$ relative to the orthogonal
canonical basis of $\mathbb{R}^1_n$: $e_1 = (1, 0, \dots)$,
$e_2 = (0, 1, 0, \dots)$, etc. As was noted earlier (see Lemma 3), the
transformation from one set of coordinates to the other is performed through
the right multiplication of the $(p \times n)$-matrix $X$ by some orthogonal
$(n \times n)$-matrix, say $C$:

$$ \|Y_1, Y_2, \dots, Y_n\| = \|X_1, X_2, \dots, X_n\| C, \quad \text{or} \quad \mathcal{Y} = XC. $$

Thus the $p$-columns $Y_1, \dots, Y_n$ are jointly normally distributed.
Following Lemma 3,

$$ \operatorname{Var} \mathcal{Y} = \operatorname{Var} X = \{ \delta_{ij} \Sigma \mid i, j = \overline{1,n} \}. $$

This means that $Y_1, \dots, Y_n$ are independent gaussian $p$-columns with
common covariance matrix $\Sigma$, just like the $p$-columns
$X_1, \dots, X_n$.

Let us consider the random $(p \times p)$-matrices

$$ (\operatorname{proj}_{\mathcal{L}_1} X, \operatorname{proj}_{\mathcal{L}_1} X), \quad (\operatorname{proj}_{\mathcal{L}_2} X, \operatorname{proj}_{\mathcal{L}_2} X), \quad \dots $$

For example,

$$ (\operatorname{proj}_{\mathcal{L}_1} X, \operatorname{proj}_{\mathcal{L}_1} X) = \sum_{i=1}^{m_1} Y_i Y_i^T. $$

The distribution of such random matrices is called a Wishart distribution.
If $\mathrm{E}Y_1 = \mathrm{E}Y_2 = \dots = \mathrm{E}Y_{m_1} = 0$, we get the
so-called central Wishart distribution $W_p(m_1, \Sigma)$. Let us note that
if one uses the notation $W_p(m, \Sigma)$ for a random matrix itself, not
only for its distribution, then one can say that

$$ W_p(m, \Sigma) = \Sigma^{1/2}\, W_p(m, I)\, \Sigma^{1/2}, $$

where $\Sigma^{1/2}$ denotes the unique symmetric positive definite solution
of the matrix equation $Z^2 = \Sigma$.
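The role of $\Sigma^{1/2}$ can be illustrated for $p = 2$, where the symmetric positive definite square root has a closed form. The following sketch is plain Python; the matrix `Sigma`, the columns of `Z` (standing in for draws with identity covariance) and the helper names are hypothetical.

```python
import math


def matmul(A, B):
    """Plain-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def sym_sqrt_2x2(S):
    """Symmetric positive definite square root of a 2x2 positive definite
    symmetric matrix: (S + s I) / t with s = sqrt(det S), t = sqrt(tr S + 2 s)."""
    (a, b), (_, c) = S
    s = math.sqrt(a * c - b * b)
    t = math.sqrt(a + c + 2.0 * s)
    return [[(a + s) / t, b / t], [b / t, (c + s) / t]]


Sigma = [[5.0, 4.0], [4.0, 5.0]]          # hypothetical covariance matrix
root = sym_sqrt_2x2(Sigma)

# Fixed columns Z_1, Z_2, Z_3 playing the role of a sample with covariance I.
Z = [[1.0, 0.0, -1.0],
     [2.0, 1.0, 0.0]]

# Columns Y_i = Sigma^{1/2} Z_i, and the two sides of the identity
# sum Y_i Y_i^T = Sigma^{1/2} (sum Z_i Z_i^T) Sigma^{1/2}.
Y = matmul(root, Z)
lhs = matmul(Y, transpose(Y))
rhs = matmul(matmul(root, matmul(Z, transpose(Z))), root)

print(root, lhs, rhs)
```

The matrix with the opposite sign is also a symmetric solution of $Z^2 = \Sigma$, which is why positive definiteness is needed to single out $\Sigma^{1/2}$.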

One says that a random $(p \times p)$-matrix $W$ has the noncentral Wishart
distribution if

$$ W = \sum_{i=1}^{m} (\xi_i + a_i)(\xi_i + a_i)^T, $$

where the $p$-columns $\xi_1, \xi_2, \dots, \xi_m$ are iid $N_p(0, \Sigma)$,
and $a_1, a_2, \dots, a_m$ are some nonrandom $p$-columns, generally distinct
from zero. The distribution of $W$ somehow depends on the $p$-columns
$a_1, a_2, \dots, a_m$. Let us show that the distribution of $W$ depends on
these $p$-columns through the so-called parameter of noncentrality: the
$(p \times p)$-matrix

$$ \Delta = \sum_{i=1}^{m} a_i a_i^T. $$

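Let us introduce the $(p \times m)$-matrices

$$ \xi = \|\xi_1, \xi_2, \dots, \xi_m\|, \qquad \mathcal{A} = \|a_1, a_2, \dots, a_m\|. $$

In these notations

$$ W = (\xi + \mathcal{A}, \xi + \mathcal{A}). $$

Let $C$ be an arbitrary orthogonal $(m \times m)$-matrix. Say $\eta = \xi C$.
Note that $\eta \overset{d}{=} \xi$ (the two matrices have the same
distribution) and

$$ W = (\eta + \mathcal{A}C, \eta + \mathcal{A}C). $$

We see that the noncentral Wishart distribution depends on
$\mathcal{A} = \|a_1, \dots, a_m\|$ not directly but through the maximal
invariant of $\mathcal{A}$ under orthogonal transformations, that is through
$(\mathcal{A}, \mathcal{A}) = \sum_{i=1}^{m} a_i a_i^T$.

Therefore, in the general case

$$ (\operatorname{proj}_{\mathcal{L}_i} X, \operatorname{proj}_{\mathcal{L}_i} X) = W_p(m_i, \Sigma, \Delta_i), $$

where $\Delta_i = (\operatorname{proj}_{\mathcal{L}_i} \mathrm{E}X, \operatorname{proj}_{\mathcal{L}_i} \mathrm{E}X)$. □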
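The algebraic core of the invariance argument is that right multiplication by an orthogonal matrix leaves a scalar product $(P, P) = PP^T$ unchanged. The sketch below checks this in plain Python for $m = 2$, where a rotation serves as the orthogonal matrix; the angle, the matrices and the helper names are hypothetical, and `xi` is one fixed array standing in for a realization of the random matrix.

```python
import math


def matmul(A, B):
    """Plain-list matrix product."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def gram(P):
    """Matrix scalar product (P, P) = P P^T."""
    return matmul(P, [list(col) for col in zip(*P)])


def add(P, Q):
    return [[x + y for x, y in zip(rp, rq)] for rp, rq in zip(P, Q)]


def max_abs_diff(P, Q):
    return max(abs(x - y) for rp, rq in zip(P, Q) for x, y in zip(rp, rq))


theta = 0.7                                    # arbitrary rotation angle
C = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta), math.cos(theta)]]       # orthogonal (m x m)-matrix, m = 2

A = [[1.0, 2.0], [0.5, -1.0], [3.0, 0.0]]      # nonrandom shifts a_1, a_2 (p = 3)
xi = [[0.3, -1.2], [1.1, 0.4], [-0.7, 0.9]]    # fixed stand-in for the random part

delta = gram(A)                                # parameter of noncentrality
delta_rotated = gram(matmul(A, C))             # the same parameter for the shifts AC

W = gram(add(xi, A))
W_rotated = gram(add(matmul(xi, C), matmul(A, C)))

print(max_abs_diff(delta, delta_rotated), max_abs_diff(W, W_rotated))
```

Both differences are zero up to rounding, so the shift matrices `A` and `AC` give the same parameter of noncentrality.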
Let us return to the unbiased estimate of the parameter $\Sigma$ of linear
models. In the linear model (2.19),
$\operatorname{proj}_{\mathcal{L}^\perp} \mathrm{E}X = 0$. Therefore the
statistic (2.22) is

$$ \frac{1}{\dim \mathcal{L}^\perp} (\operatorname{proj}_{\mathcal{L}^\perp} X, \operatorname{proj}_{\mathcal{L}^\perp} X) = \frac{1}{n - m}\, \Sigma^{1/2}\, W_p(n - m, I)\, \Sigma^{1/2}. $$

It is obvious that its expected value is $\Sigma$.

2.5 Testing Linear Hypotheses 

Copying the univariate linear model, we shall define the hypothesis in the
multivariate linear model (2.19) as

$$ H : \mathrm{E}X \in \mathcal{L}_1, \tag{2.23} $$

where $\mathcal{L}_1$ is a given submodule such that
$\mathcal{L}_1 \subset \mathcal{L}$.

In this section we will propose statistics which may serve as the basis for
the construction of statistical criteria for testing $H$ (2.23) and which
are free (under $H$) from the parameters $M$, $\Sigma$.

Let us introduce the submodule $\mathcal{L}_2$, which is the orthogonal
complement of $\mathcal{L}_1$ with respect to $\mathcal{L}$:

$$ \mathcal{L} = \mathcal{L}_1 \oplus \mathcal{L}_2. \tag{2.24} $$

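Let us consider the decomposition of the space $\mathbb{R}^p_n$ into three
pairwise orthogonal submodules:

$$ \mathbb{R}^p_n = \mathcal{L}_1 \oplus \mathcal{L}_2 \oplus \mathcal{L}^\perp. $$

Following Theorem 5, the random matrices

$$ S_1 := (\operatorname{proj}_{\mathcal{L}^\perp} X, \operatorname{proj}_{\mathcal{L}^\perp} X) \quad \text{and} \quad S_2 := (\operatorname{proj}_{\mathcal{L}_2} X, \operatorname{proj}_{\mathcal{L}_2} X) $$

are independent and have Wishart distributions. Regardless of $H$,

$$ S_1 = (\operatorname{proj}_{\mathcal{L}^\perp} X, \operatorname{proj}_{\mathcal{L}^\perp} X) = W_p(n - m, \Sigma). \tag{2.25} $$

If the hypothesis $H$ (2.23) is true, then

$$ S_2 = (\operatorname{proj}_{\mathcal{L}_2} X, \operatorname{proj}_{\mathcal{L}_2} X) = W_p(m_2, \Sigma). \tag{2.26} $$

(Here and further we denote $m = \dim \mathcal{L}$,
$m_1 = \dim \mathcal{L}_1$, $m_2 = \dim \mathcal{L}_2$.)

Under the alternative to $H$ (2.23), the Wishart distribution of the
statistic (2.26) becomes noncentral with the parameter of noncentrality

$$ \Delta = (\operatorname{proj}_{\mathcal{L}_2} \mathrm{E}X, \operatorname{proj}_{\mathcal{L}_2} \mathrm{E}X). $$

The noncentrality parameter shows the degree of violation of the hypothesis
$H$ (2.23): $\mathrm{E}X \in \mathcal{L}_1$.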
In the one-dimensional case (when $p = 1$) the statistics (2.25) and (2.26)
turn into random variables distributed as $\sigma^2 \chi^2(n - m)$ and
$\sigma^2 \chi^2(m_2)$ respectively. Their ratio (under the hypothesis) has a
distribution free of the unknown parameters, and therefore it can be used as
a statistical criterion for testing $H$. This is, up to normalization by the
degrees of freedom, the well-known F-ratio of Fisher.

In the multivariate case the analogue of the F-ratio should be the "ratio"
of the $(p \times p)$-matrices $S_2$ and $S_1$. Under $n - m > p$ the matrix
$S_1$ (2.25) is non-degenerate, and therefore there exists a statistic
(a $(p \times p)$-matrix)

$$ (\operatorname{proj}_{\mathcal{L}_2} X, \operatorname{proj}_{\mathcal{L}_2} X)\, (\operatorname{proj}_{\mathcal{L}^\perp} X, \operatorname{proj}_{\mathcal{L}^\perp} X)^{-1}. \tag{2.27} $$

Unlike the one-dimensional case ($p = 1$), the distribution of the statistic
(2.27) is not free of the parameters. By distribution, (2.27) is equal to

$$ \Sigma^{1/2}\, W_p(m_2, I)\, W_p^{-1}(n - m, I)\, \Sigma^{-1/2}. \tag{2.28} $$

However, the eigenvalues of the matrix (2.27) under the hypothesis $H$ (2.23)
have a distribution free of $M$, $\Sigma$. These eigenvalues coincide with
the roots of the equation in $\lambda$

$$ \det\bigl( W_p(m_2, I) - \lambda\, W_p(n - m, I) \bigr) = 0. \tag{2.29} $$

Therefore certain functions of the roots of equation (2.29) are traditionally
used as critical statistics in testing linear hypotheses.
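For $p = 2$ the determinantal equation $\det(S_2 - \lambda S_1) = 0$ is a quadratic in $\lambda$ and can be solved directly. The sketch below is plain Python; the matrices `S1` and `S2` are hypothetical numbers, not the result of an actual projection, and the four named statistics are the standard ones from the literature, which the text above does not itself name.

```python
import math

# Hypothetical symmetric matrices: S1 positive definite ("error"),
# S2 nonnegative definite ("hypothesis").
S1 = [[2.0, 1.0], [1.0, 2.0]]
S2 = [[3.0, 0.0], [0.0, 3.0]]


def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]


def generalized_roots(S2, S1):
    """Roots of det(S2 - lam * S1) = 0 for symmetric 2x2 matrices, largest
    first. The roots are real when S1 is positive definite."""
    qa = det2(S1)
    qb = -(S2[0][0] * S1[1][1] + S2[1][1] * S1[0][0] - 2.0 * S2[0][1] * S1[0][1])
    qc = det2(S2)
    disc = math.sqrt(qb * qb - 4.0 * qa * qc)
    return [(-qb + disc) / (2.0 * qa), (-qb - disc) / (2.0 * qa)]


roots = generalized_roots(S2, S1)

# The same numbers are the eigenvalues of the "ratio" S2 * S1^{-1}:
# their sum is its trace and their product is its determinant.
d = det2(S1)
S1_inv = [[S1[1][1] / d, -S1[0][1] / d], [-S1[1][0] / d, S1[0][0] / d]]
ratio = [[sum(S2[i][k] * S1_inv[k][j] for k in range(2)) for j in range(2)]
         for i in range(2)]
trace_ratio = ratio[0][0] + ratio[1][1]
det_ratio = det2(ratio)

# Commonly used functions of the roots.
wilks_lambda = 1.0 / ((1.0 + roots[0]) * (1.0 + roots[1]))
lawley_hotelling_trace = roots[0] + roots[1]
pillai_trace = sum(r / (1.0 + r) for r in roots)
roy_largest_root = roots[0]

print(roots, wilks_lambda, lawley_hotelling_trace, pillai_trace, roy_largest_root)
```

For larger $p$ one would use a numerical generalized eigenvalue routine instead of a closed-form quadratic; the closed form is used here only to keep the sketch free of external dependencies.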

Here our investigation enters the traditional realm of multivariate sta- 
tistical analysis, and therefore must be finished. 